

2006/02/14 15:40 
ABSTRACT



Finding pages on the Web that are similar to a query page (Related Pages) is an important 
component of modern search engines. A variety of strategies have been proposed for answering 
Related Pages queries, but comparative evaluation by user studies is expensive, especially when 
large strategy spaces must be searched (e.g., when tuning parameters). We present a technique for 
automatically evaluating strategies using Web hierarchies, such as Open Directory, in place of user 
feedback. We apply this evaluation methodology to a mix of document representation strategies, 
including the use of text, anchor-text, and links. We discuss the relative advantages and 
disadvantages of the various approaches examined. Finally, we describe how to efficiently construct a 
similarity index out of our chosen strategies, and provide sample results from our index. 
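
As an illustration only, the idea of scoring a strategy against a Web hierarchy instead of user feedback can be sketched as follows. This is a minimal sketch under stated assumptions, not the evaluation measure used in this work: it assumes that each page has a single directory path (for example, an Open Directory category), and that two pages count as more related the more leading path components they share. The names `common_prefix_depth`, `ordering_agreement`, and `category_of`, and the sample paths, are hypothetical.

```python
from itertools import combinations


def common_prefix_depth(path_a, path_b):
    """Count the leading directory components shared by two category paths."""
    depth = 0
    for a, b in zip(path_a.split("/"), path_b.split("/")):
        if a != b:
            break
        depth += 1
    return depth


def ordering_agreement(query, ranked, category_of):
    """Fraction of result pairs whose order agrees with the hierarchy.

    ranked: pages returned by a strategy, most similar first.
    category_of: dict mapping page -> directory path (assumed input).
    Pairs that the hierarchy considers equally related are skipped.
    """
    q_path = category_of[query]
    agree = total = 0
    # combinations() keeps input order, so `upper` is ranked above `lower`.
    for upper, lower in combinations(ranked, 2):
        d_upper = common_prefix_depth(q_path, category_of[upper])
        d_lower = common_prefix_depth(q_path, category_of[lower])
        if d_upper == d_lower:
            continue
        total += 1
        if d_upper > d_lower:
            agree += 1
    return agree / total if total else 0.0


# Hypothetical pages and category paths, for illustration only.
category_of = {
    "q": "Top/Computers/Internet/Searching",
    "a": "Top/Computers/Internet/Searching",
    "b": "Top/Computers/Hardware",
    "c": "Top/Arts/Music",
}

score_good = ordering_agreement("q", ["a", "b", "c"], category_of)
score_bad = ordering_agreement("q", ["c", "b", "a"], category_of)
```

Under these assumptions, a strategy that ranks same-category pages first scores higher than one that ranks them last, so many parameter settings can be compared without running a user study for each.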